feat(api): poll power state for PowerShelf in Ready #1878

Merged
vinodchitraliNVIDIA merged 1 commit into
NVIDIA:mainfrom
vinodchitraliNVIDIA:vc/power_state
May 30, 2026

Conversation

@vinodchitraliNVIDIA
Contributor

While a PowerShelf idles in Ready, perform a best-effort GetPowerStateByDeviceList RPC against RMS and persist the observed pstate to power_shelves.status. The lookup mirrors the NodeSet-with-inline-BMC-endpoint shape used by the Maintenance handler's SetPowerStateByDeviceList path, so build_power_shelf_node_info is bumped to pub(super) for reuse. Missing prerequisites (no RMS client, no rack association, no BMC details, no credentials) and transport / status failures are logged but never transition the controller out of Ready, so a transient RMS outage cannot bounce the shelf into Error. Also bumps librms from v0.0.12-rc1 to v0.0.12-rc4 (adds the prost-types dependency and the new GetPowerStateByDeviceList, firmware-object management, and SetScaleUpFabricState RPCs), and updates the in-tree mock RmsApi implementations to cover the new trait methods.
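The best-effort behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `PowerStateSource`, `Shelf`, `PollSkip`, `FixedSource`, and `handle_ready` are hypothetical names, and the real handler talks to RMS through librms and persists to `power_shelves.status` rather than to an in-memory field.

```rust
/// Hypothetical power state; the real librms type is not shown in this PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PowerState {
    On,
    Off,
}

/// Controller states relevant to this sketch. `Error` exists only to show
/// that the Ready handler never returns it.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ControllerState {
    Ready,
    Error,
}

/// Why a poll produced no reading (missing prerequisite or RPC failure).
#[derive(Debug, PartialEq, Eq)]
pub enum PollSkip {
    NoRmsClient,
    NoRackAssociation,
    NoBmcDetails,
    NoCredentials,
    Rpc(String),
}

/// Hypothetical stand-in for the RMS client used by the handler.
pub trait PowerStateSource {
    fn get_power_state(&self, bmc_endpoint: &str) -> Result<PowerState, String>;
}

/// Hypothetical in-memory view of a power shelf.
#[derive(Debug, Default)]
pub struct Shelf {
    pub rack_id: Option<String>,
    pub bmc_endpoint: Option<String>,
    pub has_credentials: bool,
    pub observed_power_state: Option<PowerState>,
}

/// Check every prerequisite, then ask the source for the power state.
fn poll(rms: Option<&dyn PowerStateSource>, shelf: &Shelf) -> Result<PowerState, PollSkip> {
    let rms = rms.ok_or(PollSkip::NoRmsClient)?;
    if shelf.rack_id.is_none() {
        return Err(PollSkip::NoRackAssociation);
    }
    let endpoint = shelf.bmc_endpoint.as_deref().ok_or(PollSkip::NoBmcDetails)?;
    if !shelf.has_credentials {
        return Err(PollSkip::NoCredentials);
    }
    rms.get_power_state(endpoint).map_err(PollSkip::Rpc)
}

/// Ready handler: record a successful reading, log anything else,
/// and stay in Ready in every case.
pub fn handle_ready(rms: Option<&dyn PowerStateSource>, shelf: &mut Shelf) -> ControllerState {
    match poll(rms, shelf) {
        Ok(state) => shelf.observed_power_state = Some(state),
        Err(reason) => eprintln!("power state poll skipped: {reason:?}"),
    }
    ControllerState::Ready
}

/// Test double that always returns the same canned result.
pub struct FixedSource(pub Result<PowerState, String>);

impl PowerStateSource for FixedSource {
    fn get_power_state(&self, _bmc_endpoint: &str) -> Result<PowerState, String> {
        self.0.clone()
    }
}
```

Keeping the fallible work in `poll` and the state decision in `handle_ready` makes the invariant easy to see: the only value `handle_ready` can return is `ControllerState::Ready`, and a failed poll leaves the last observed state untouched.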

Description

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@vinodchitraliNVIDIA vinodchitraliNVIDIA requested a review from a team as a code owner May 21, 2026 22:32
@copy-pr-bot

copy-pr-bot Bot commented May 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@zhaozhongn
Contributor

Is it possible to thread the new RMS APIs through the Powershelf Manager interface and call them from the state machine via indirection? We should avoid adding direct RMS calls from the state machine whenever possible.
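One way to picture the indirection being asked for is a manager trait that the state machine depends on, with an RMS-backed implementation behind it. The sketch below is purely illustrative: `PowerShelfManager`, `RmsClient`, `RmsBackedManager`, `StubRms`, `query_power`, and `observe_in_ready` are hypothetical names, and nothing here reflects the actual Powershelf Manager interface or the follow-up change.

```rust
/// Hypothetical low-level RMS client (the real one comes from librms).
pub trait RmsClient {
    /// Returns true when the device reports power on.
    fn query_power(&self, device: &str) -> Result<bool, String>;
}

/// Hypothetical manager interface that the state machine would depend on.
pub trait PowerShelfManager {
    fn power_is_on(&self, shelf_id: &str) -> Result<bool, String>;
}

/// Manager implementation that forwards to RMS. Only this type knows
/// that RMS exists.
pub struct RmsBackedManager<C: RmsClient> {
    pub client: C,
}

impl<C: RmsClient> PowerShelfManager for RmsBackedManager<C> {
    fn power_is_on(&self, shelf_id: &str) -> Result<bool, String> {
        self.client.query_power(shelf_id)
    }
}

/// State-machine side: sees only the manager trait, never the RMS client.
/// Failures become `None` so the caller can log them and stay in Ready.
pub fn observe_in_ready(manager: &dyn PowerShelfManager, shelf_id: &str) -> Option<bool> {
    manager.power_is_on(shelf_id).ok()
}

/// Stub RMS client for tests.
pub struct StubRms {
    pub on: bool,
}

impl RmsClient for StubRms {
    fn query_power(&self, _device: &str) -> Result<bool, String> {
        Ok(self.on)
    }
}
```

With this shape, swapping the backend or mocking it in tests only requires another `PowerShelfManager` implementation; the Ready handler itself does not change.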

Contributor

@Matthias247 Matthias247 left a comment

A few comments about possible further improvements, but this looks good to merge.

Comment thread crates/api/src/state_controller/power_shelf/ready.rs Outdated
Comment thread crates/power-shelf-controller/src/ready.rs
Comment thread crates/power-shelf-controller/src/ready.rs
@vinodchitraliNVIDIA
Contributor Author

> Is it possible to thread the new RMS APIs through the Powershelf Manager interface and call them from the state machine via indirection? We should avoid adding direct RMS calls from the state machine whenever possible.

I have a follow-up change for the CM wiring.

@vinodchitraliNVIDIA
Contributor Author

> Is it possible to thread the new RMS APIs through the Powershelf Manager interface and call them from the state machine via indirection? We should avoid adding direct RMS calls from the state machine whenever possible.

#1953

@amit-pabalkar

> Is it possible to thread the new RMS APIs through the Powershelf Manager interface and call them from the state machine via indirection? We should avoid adding direct RMS calls from the state machine whenever possible.

@zhaozhongn will make sure this gets addressed in a follow-up PR

@amit-pabalkar amit-pabalkar left a comment

Approved

@vinodchitraliNVIDIA vinodchitraliNVIDIA force-pushed the vc/power_state branch 5 times, most recently from 90e10ec to 463ea31 Compare May 30, 2026 06:19
@vinodchitraliNVIDIA vinodchitraliNVIDIA merged commit c229f0b into NVIDIA:main May 30, 2026
52 checks passed

5 participants